
Proteins: Structure, Function, and Bioinformatics

Wiley

All preprints, ranked by how well they match Proteins: Structure, Function, and Bioinformatics's content profile, based on 82 papers previously published here. The average preprint has a 0.03% match score for this journal, so anything above that is already an above-average fit. Older preprints may already have been published elsewhere.

1
AlphaFold-Multimer Modelling of Linked nAChR Subunits Challenges Concatemer Design Assumptions

Sahlstrom, H. M.; Rufener, L.; Horsberg, K. H.; Sarr, A.; Horsberg, T. E.; Bakke, M. J.

2025-10-04 bioinformatics 10.1101/2025.10.02.679753 medRxiv
Top 0.1%
29.1%

Nicotinic acetylcholine receptors (nAChRs) are well described in vertebrates, yet less studied in arthropods, and the subunit stoichiometry of heteromeric arthropod nAChRs remains unresolved. This study combined a computational and experimental approach to predict and validate the stoichiometries of two heteromeric nAChRs from the parasitic arthropod Lepeophtheirus salmonis. AlphaFold2- and AlphaFold-Multimer-based modelling, supported by multiple sequence alignment and functional expression in Xenopus laevis oocytes, identified the most likely stoichiometries for two compositions of subunits previously described, named Lsa-nAChR1 and Lsa-nAChR2. For receptor Lsa-nAChR1, the highest scoring stoichiometry was 1b122b2. Lsa-nAChR2 exhibited three possible stoichiometries, confirmed by both computational modelling and experiments. These are 3β23β1β2, 3β13β1β2, and 333β1β2. All stoichiometries are written from a counterclockwise, extracellular orientation. Strikingly, structural modelling also suggested that linker flexibility in concatemer constructs may allow a novel conformation with a different subunit between the linked subunits, referred to here as "wedging". These results indicate that the use of flexible linker sequences does not reliably enforce subunit position or assembly directionality, as shown here for nAChRs. These findings challenge the assumption that linked concatemers unambiguously dictate receptor stoichiometry. Thus, interpretations of concatemer-based studies, on nAChRs and other receptor systems, may warrant careful reevaluation.

2
Redesigning OmpA Loops Using Canonical Outer Membrane Protein Loop Structures

Franklin, M. W.; Krise, J.; Stevens, J. J.; Slusky, J. S. G.

2020-10-08 biophysics 10.1101/2020.10.08.331546 medRxiv
Top 0.1%
26.0%

Outer membrane proteins are all beta barrels and these barrels have a variety of well-documented loop conformations. Here we test the effect of three different loop types on outer membrane protein A (OmpA) folding. We designed twelve 5-residue loops and experimentally tested the effect of replacing the long loops of outer membrane protein OmpA with the designed loops. Our studies succeeded in creating the smallest known outer membrane barrel. We find that significant changes in OmpA loops do not have a strong overall effect on OmpA folding. However, when decomposing folding into a fast rate and a slow rate we find that changes in loops strongly affect the slow rate of OmpA folding. Extracellular loop types with higher levels of hydrogen bonds had more instances of increasing the slow folding rate and extracellular loop types with low levels of hydrogen bonds had more instances of decreasing the slow folding rate. Having the slow rate affected by loop composition is consistent with the slow rate being associated with the insertion step of outer membrane protein folding.

3
How AlphaFold and related models predict protein-peptide complex structures

Guan, L.; Keating, A. E.

2025-06-24 bioinformatics 10.1101/2025.06.18.660495 medRxiv
Top 0.1%
22.5%

Protein-peptide interactions mediate many biological processes, and access to accurate structural models, through experimental determination or reliable computational prediction, is essential for understanding protein function and designing novel protein-protein interactions. AlphaFold2-Multimer (AF2-Multimer), AlphaFold3 (AF3), and related models such as Boltz-1 and Chai-1 are state-of-the-art protein structure predictors that successfully predict protein-peptide complex structures. Using a dataset of experimentally resolved protein-peptide structures, we analyzed the performance of these four structure prediction models to understand how they work. We found evidence of bias for previously seen structures, suggesting that models may struggle to generalize to novel target proteins or binding sites. We probed how models use the protein and peptide multiple sequence alignments (MSAs), which are often shallow or of poor quality for peptide sequences. We found weak evidence that models use coevolutionary information from paired MSAs and found that both the target and peptide unpaired MSAs contribute to performance. Our work highlights the promise of deep learning for peptide docking and the importance of diverse representation of interface geometries in the training data for optimal prediction performance.

4
Consensus Finder web tool to predict stabilizing substitutions in proteins

Jones, B. J.; Kan, C. N. E.; Luo, C.; Kazlauskas, R.

2020-06-30 bioengineering 10.1101/2020.06.29.178418 medRxiv
Top 0.1%
22.2%

The consensus sequence approach to predicting stabilizing substitutions in proteins rests on the notion that conserved amino acids are more likely to contribute to the stability of a protein fold than non-conserved amino acids. To implement a prediction for a target protein sequence, one finds homologous sequences and aligns them in a multiple sequence alignment. The sequence of the most frequently occurring amino acid at each position is the consensus sequence. Replacement of a rarely occurring amino acid in the target with a frequently occurring amino acid is predicted to be stabilizing. Consensus Finder is an open-source web tool that automates this prediction. This chapter reviews the rationale for the consensus sequence approach and explains the options for fine-tuning this approach using Staphylococcus nuclease A as an example.
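The consensus procedure described in this abstract can be illustrated with a minimal Python sketch. It is not Consensus Finder's code: the function names, the toy alignment, and the two frequency thresholds (`min_consensus_freq`, `max_target_freq`) are hypothetical choices made for illustration, and the homologs are assumed to be already aligned to the target, with `-` marking gaps.

```python
from collections import Counter


def consensus_sequence(alignment):
    """Return the most frequent non-gap residue in each alignment column."""
    consensus = []
    for column in zip(*alignment):
        counts = Counter(res for res in column if res != "-")
        consensus.append(counts.most_common(1)[0][0] if counts else "-")
    return "".join(consensus)


def suggest_substitutions(target, alignment,
                          min_consensus_freq=0.6, max_target_freq=0.25):
    """List (position, target_residue, consensus_residue) candidates.

    A position is reported when the consensus residue is frequent and the
    target's own residue is rare among the homologs. Both thresholds are
    illustrative assumptions, not values taken from the source.
    """
    suggestions = []
    columns = zip(*alignment)
    for pos, (target_res, column) in enumerate(zip(target, columns), start=1):
        residues = [res for res in column if res != "-"]
        if target_res == "-" or not residues:
            continue
        counts = Counter(residues)
        top_res, top_count = counts.most_common(1)[0]
        top_freq = top_count / len(residues)
        target_freq = counts[target_res] / len(residues)
        if (top_res != target_res
                and top_freq >= min_consensus_freq
                and target_freq <= max_target_freq):
            suggestions.append((pos, target_res, top_res))
    return suggestions


# Toy data (hypothetical): five aligned homologs and one target sequence.
homologs = ["MKTAY", "MKSAY", "MKTAY", "MRTAY", "MKTAF"]
target = "MRSAF"

print(consensus_sequence(homologs))             # MKTAY
print(suggest_substitutions(target, homologs))  # [(2, 'R', 'K'), (3, 'S', 'T'), (5, 'F', 'Y')]
```

In this toy case the target differs from the consensus at positions 2, 3, and 5, and each of its residues there occurs in only one of five homologs, so all three are flagged as candidate replacements. A real workflow would also need homolog retrieval, redundancy filtering, and alignment, which the sketch leaves out.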

5
A common network of residue-residue contacts underlies peptides' interactions with MHC class II complex

Kister, A. E.; Kister, I.

2025-03-25 immunology 10.1101/2025.03.22.644772 medRxiv
Top 0.1%
22.0%

The formation of a stable peptide-MHC class II complex is a critical step in the adaptive immune response. In this work, we investigate the residue-residue contacts that anchor the peptide between the alpha and beta chains of MHC II and examine whether such anchoring residue-residue contacts are shared among different peptide-MHC II complexes. We hypothesize that there is a similarity between the map of contacts of antigenic peptides with the alpha and beta chains of MHC II and the map of contacts of the "natural" complex of MHC II with the CLIP - the fragment of the gamma chain. Thus, the CLIP-MHC II complex - specifically, PDB structure 3PDO - was taken as the prototype for peptide-MHC II interaction. To compare the contact maps between the prototype structure and antigenic peptides/MHC II in 14 crystal structures, we developed a unified numbering system for residues in peptide-MHC II complexes. Using this unified residue numbering system, we show that approximately half of the CLIP-MHC II residue-residue contacts have analogs in structures that involve different antigenic peptides and different MHC II (HLA-DR, HLA-DQ, and mouse A/B) alpha and beta chains. We present here this common network of contacts that underlies peptide/MHC class II interactions, as well as the structural and physicochemical characteristics of these contacts. Based on these shared characteristics, we propose criteria for the specificity of antigenic peptide loading into MHC II, whereby one can predict whether a particular peptide fragment will bind to MHC II as well as the likely localization of the fragment within the peptide binding groove of MHC II.

6
Molecular Determinants of μ-Conotoxin KIIIA interaction with the Voltage-Gated Sodium Channel Nav1.7

Kimball, I. H.; Nguyen, P. T.; Olivera, B. M.; Sack, J. T.; Yarov-Yarovoy, V.

2019-06-20 biophysics 10.1101/654889 medRxiv
Top 0.1%
18.3%

The voltage-gated sodium (NaV) channel subtype NaV1.7 plays a critical role in pain signaling, making it an important drug target. Here we studied the molecular interactions between μ-conotoxin KIIIA (KIIIA) and the human NaV1.7 channel (hNaV1.7). We developed a structural model of hNaV1.7 using Rosetta computational modeling and performed in silico docking of KIIIA using RosettaDock to predict residues forming specific pairwise contacts between KIIIA and hNaV1.7. We experimentally validated these contacts using mutant cycle analysis. Comparison between our KIIIA-hNaV1.7 model and the cryo-EM structure of KIIIA-hNaV1.2 revealed key similarities and differences between NaV channel subtypes with potential implications for the molecular mechanism of toxin block. The accuracy of our integrative approach, combining structural data with computational modeling, experimental validation, and molecular dynamics simulations, suggests that Rosetta structural predictions will be useful for rational design of novel biologics targeting specific NaV channels.

7
Subunit epsilon of E. coli F1Fo ATP synthase attenuates enzyme activity by modulating central stalk flexibility

Sobti, M.; Walshe, J. L.; Zeng, Y. C.; Ishmukhametov, R.; Stewart, A. G.

2020-09-30 biophysics 10.1101/2020.09.30.320408 medRxiv
Top 0.1%
18.2%

F1Fo ATP synthase functions as a biological rotary generator that makes a major contribution to cellular energy production. Proton flow through the Fo motor generates rotation of the central stalk, inducing conformational changes in the F1 motor that catalyzes ATP production via flexible coupling. Here we present a range of cryo-EM structures of E. coli ATP synthase in different rotational and inhibited states observed following a 45 second incubation with 10 mM MgATP. The structures generated describe multiple changes that occur following addition of MgATP, with the inhibitory C-terminal domain of subunit ε (εCTD) disassociating from the central stalk to adopt a condensed "down" conformation. The transition to the εCTD down state increases the torsional flexibility of the central stalk allowing its foot to rotate by ~50°, with further flexing in the peripheral stalk enabling the c-ring to rotate by two sub-steps in the Fo motor. Truncation mutants lacking the second helix of the εCTD suggest that central stalk rotational flexibility is important for F1Fo ATP synthase function. Overall this study identifies the potential role played by torsional flexing within the rotor and how this could be influenced by the ε subunit.

8
Alternating handedness motifs in proteins classify structure and cofactor binding

Rizwan, S.; Pike, D.; Poudel, S.; Nanda, V.

2020-11-20 bioinformatics 10.1101/2020.11.17.367490 medRxiv
Top 0.1%
16.9%

Cofactor binding sites in proteins often are composed of favorable interactions of specific cofactors with the sidechains and/or backbone protein fold motifs. In many cases these motifs contain left-handed conformations which enable tight turns of the backbone that present backbone amide protons in direct interactions with cofactors termed cationic nests. Here, we defined alternating handedness of secondary structure as a search constraint within the PDB to systematically identify these cofactor binding nests. We identify unique alternating handedness structural motifs which are specific to the cofactors they bind. These motifs can guide the design of engineered folds that utilize specific cofactors and also enable us to gain a deeper insight into the evolution of the structure of cofactor binding sites.

9
Petascale Homology Search for Structure Prediction

Lee, S.; Kim, G.; Levy Karin, E.; Mirdita, M.; Park, S.; Chikhi, R.; Babaian, A.; Kryshtafovych, A.; Steinegger, M.

2023-07-11 bioinformatics 10.1101/2023.07.10.548308 medRxiv
Top 0.1%
16.9%

The recent CASP15 competition highlighted the critical role of multiple sequence alignments (MSAs) in protein structure prediction, as demonstrated by the success of the top AlphaFold2-based prediction methods. To push the boundaries of MSA utilization, we conducted a petabase-scale search of the Sequence Read Archive (SRA), resulting in gigabytes of aligned homologs for CASP15 targets. These were merged with default MSAs produced by ColabFold-search and provided to ColabFold-predict. By using SRA data, we achieved highly accurate predictions (GDT_TS > 70) for 66% of the non-easy targets, whereas using ColabFold-search default MSAs scored highly in only 52%. Next, we tested the effect of deep homology search and ColabFold's advanced features, such as more recycles, on prediction accuracy. While SRA homologs were most significant for improving ColabFold's CASP15 ranking from 11th to 3rd place, other strategies contributed too. We analyze these in the context of existing strategies to improve prediction.

10
Assessment of Protein Complex Predictions in CASP16: Are we making progress?

Zhang, J.; Yuan, R.; Kryshtafovych, A.; Kretsch, R. C.; Schaeffer, R. D.; Zhou, J.; Das, R.; Grishin, N. V.; Cong, Q.

2025-05-30 biophysics 10.1101/2025.05.29.656875 medRxiv
Top 0.1%
16.7%

The assessment of oligomer targets in the Critical Assessment of Structure Prediction Round 16 (CASP16) suggests that complex structure prediction remains an unsolved challenge. More than 30% of targets, particularly antibody-antigen targets, were highly challenging, with each group correctly predicting structures for only about a quarter of such targets. Most CASP16 groups relied on AlphaFold-Multimer (AFM) or AlphaFold3 (AF3) as their core modeling engines. By optimizing input MSAs, refining modeling constructs (using partial rather than full sequences), and employing massive model sampling and selection, top-performing groups were able to significantly outperform the default AFM/AF3 predictions. CASP16 also introduced two additional challenges: Phase 0, which required predictions without stoichiometry information, and Phase 2, which provided participants with thousands of models generated by MassiveFold (MF) to enable large-scale sampling for resource-limited groups. Across all phases, the MULTICOM series and Kiharalab emerged as top performers based on the quality of their best models per target. However, these groups did not have a strong advantage in model ranking, and thus their lead over other teams, such as Yang-Multimer and kozakovvajda, was less pronounced when evaluating only the first submitted models. Compared to CASP15, CASP16 showed moderate overall improvement, likely driven by the release of AF3 and the extensive model sampling employed by top groups. Several notable trends highlight key frontiers for future development. First, the kozakovvajda group significantly outperformed others on antibody-antigen targets, achieving over a 60% success rate without relying on AFM or AF3 as their primary modeling framework, suggesting that alternative approaches may offer promising solutions for these difficult targets. Second, model ranking and selection continue to be major bottlenecks. 
The PEZYFoldings group demonstrated a notable advantage in selecting their best models as first models, suggesting that their pipeline for model ranking may offer important insights for the field. Finally, the Phase 0 experiment indicated reasonable success in stoichiometry prediction; however, stoichiometry prediction remains challenging for high-order assemblies and targets that differ from available homologous templates. Overall, CASP16 demonstrated steady progress in multimer prediction while emphasizing the urgent need for more effective model ranking strategies, improved stoichiometry prediction, and the development of new modeling methods that extend beyond the current AF-based paradigm.

11
Optimization of the prostaglandin F2α receptor for structural biology

Salze, M.; Chretien, S.; Boora, T.; Macovei, M.; Barbeau, E.; Blais, V.; Laporte, S.; Audet, M.

2025-02-20 pharmacology and toxicology 10.1101/2025.02.15.638479 medRxiv
Top 0.1%
14.8%

Prostaglandin F2α (PGF2α) is a bioactive lipid derived from arachidonic acid and is involved in many physiological and pathophysiological processes such as parturition, vascular tone regulation, glaucoma and inflammation. It acts by binding to the Prostaglandin F2α receptor (FP), a G Protein-Coupled Receptor (GPCR) that mediates signaling events by engaging intracellular heterotrimeric G protein effectors. The orthosteric binding site of lipid-binding receptors displays greater efficacy-dependent plasticity that hinders the design of ligands. Solving the structure of FP with ligands of different efficacies at an atomic level is important to fully understand its mechanism of activation and inhibition. Most purified FP-ligand complexes are unstable in vitro. The development of new X-ray crystallography and single-particle cryo-electron microscopy (cryoEM) strategies to understand receptor signal transduction requires improved purification yield and in vitro stability of the receptor. Here, we present a protein engineering effort to optimize the FP protein sequence for use in structural biology. Strategies involve protein insertion sites in the third intracellular loop (ICL3), N-terminal and C-terminal deletions, and single-point mutations that favorably affect receptor purification yield and stability in vitro. The best FP construct displays a yield of 1.5 mg/L and a stability of 59°C, which constitute a threefold improvement in purification yield and a 9°C increase in stability over the wild-type receptor. These modifications in the receptor are suitable for pursuing alternative strategies for improving FP purification yield and for studying FP binding efficacy to its ligands through structural biology approaches.

12
Structure of heme d1-free cd1 nitrite reductase NirS

Kluenemann, T.; Blankenfeldt, W.

2020-02-13 biochemistry 10.1101/2020.02.12.945543 medRxiv
Top 0.1%
14.4%

A key step in anaerobic nitrate respiration is the reduction of nitrite to nitric oxide, which is catalysed by cd1 nitrite reductase NirS in e.g. the gram-negative opportunistic pathogen Pseudomonas aeruginosa. Each subunit of this homodimeric enzyme consists of a cytochrome c domain and an eight-bladed β-propeller that binds the uncommon isobacteriochlorin heme d1 as an essential part of its active site. Although NirS is mechanistically and structurally well studied, the focus of previous studies has been on the active, heme d1-bound form. The heme d1-free form of NirS reported here, representing a premature state of the reductase, adopts an open conformation with the cytochrome c domains moved away from each other with respect to the active enzyme. Further, movement of a loop around W498 seems to be related to a widening of the propeller, allowing easier access to the heme d1 binding side. Finally, a possible link between the open conformation of NirS and flagella formation in P. aeruginosa is discussed. Synopsis: The crystal structure of heme d1-free cd1 nitrite reductase NirS from Pseudomonas aeruginosa has been determined and provides insight into a premature form of the enzyme.

13
Structural analysis of Helicobacter pylori glutamate racemase in a monoclinic crystal form

Spiliopoulou, M.; Schulz, E. C.

2026-04-03 biochemistry 10.64898/2026.04.02.716094 medRxiv
Top 0.1%
14.4%

Glutamate racemase (MurI) catalyzes the stereochemical interconversion of L-glutamate to D-glutamate, a key element of bacterial peptidoglycan biosynthesis. In this study, we present the crystal structure of Helicobacter pylori glutamate racemase at 1.43 Å, in the same monoclinic symmetry as previously reported models but with different unit-cell parameters. The present model contains a single dimer and retains the previously described head-to-head dimer arrangement. The differences between the models arise from variations in unit-cell parameters, which lead to altered crystal packing interactions rather than changes in the quaternary assembly. The monomeric fold and active-site architecture remain conserved and are consistent with the catalytic features described for bacterial glutamate racemases. This structure provides an updated, high-resolution structural model for H. pylori glutamate racemase and highlights the variability in crystal packing within the same space group.

14
De novo protein fold families expand the designable ligand binding site space

Pan, X.; Kortemme, T.

2021-01-15 biophysics 10.1101/2021.01.13.426598 medRxiv
Top 0.1%
14.3%

A major challenge in designing proteins de novo to bind user-defined ligands with high specificity and affinity is finding backbone structures that can accommodate a desired binding site geometry with high precision. Recent advances in methods to generate protein fold families de novo have expanded the space of accessible protein structures, but it is not clear to what extent de novo proteins with diverse geometries also expand the space of designable ligand binding functions. We constructed a library of 25,806 high-quality ligand binding sites and developed a fast protocol to place ("match") these binding sites into both naturally occurring and de novo protein families with two fold topologies: Rossmann and NTF2. 5,896 and 7,475 binding sites could be matched to the Rossmann and NTF2 fold families, respectively. De novo designed Rossmann and NTF2 protein families can support 1,791 and 678 binding sites that cannot be matched to naturally existing structures with the same topologies, respectively. While the number of protein residues in ligand binding sites is the major determinant of matching success, ligand size and primary sequence separation of binding site residues also play important roles. The number of matched binding sites is a power-law function of the number of members in a fold family. Our results suggest that de novo sampling of geometric variations on diverse fold topologies can significantly expand the space of designable ligand binding sites for a wealth of possible new protein functions. Author summary: De novo design of proteins that can bind to novel and highly diverse user-defined small molecule ligands could have broad biomedical and synthetic biology applications. Because ligand binding site geometries need to be accommodated by protein backbone scaffolds at high accuracy, the diversity of scaffolds is a major limitation for designing new ligand binding functions. 
Advances in computational protein structure design methods have significantly increased the number of accessible stable scaffold structures. Understanding how many new ligand binding sites can be accommodated by the de novo scaffolds is important for designing novel ligand binding proteins. To answer this question, we constructed a large library of ligand binding sites from the Protein Data Bank (PDB). We tested the number of ligand binding sites that can be accommodated by de novo scaffolds and naturally existing scaffolds with the same fold topologies. The results showed that de novo scaffolds significantly expanded the ligand binding space of their respective fold topologies. We also identified factors that affect the difficulty of binding site accommodation, as well as the relationship between the number of scaffolds and the accessible ligand binding site space. We believe our findings will benefit future method development and applications of ligand binding protein design.

15
NMR structure of the Orf63 pro-lytic protein from lambda bacteriophage

Khan, N.; Graham, T.; Franciszkiewicz, K.; Bloch, S.; Nejman-Falenczyk, B.; Wegrzyn, A.; Donaldson, L. W.

2023-10-03 biochemistry 10.1101/2023.10.03.560691 medRxiv
Top 0.1%
14.3%

The orf63 gene resides in a region of the lambda bacteriophage genome between the exo and xis genes and is among the earliest genes transcribed during infection. In lambda phage and Shiga toxin (Stx) producing phages found in enterohemorrhagic E. coli (EHEC) associated with food poisoning, Orf63 expression reduces the host survival and hastens the period between infection and lysis, thereby giving it pro-lytic qualities. The NMR structure of dimeric Orf63 reveals a fold consisting of two helices and one strand that all make extensive intermolecular contacts. Structure-based data mining failed to identify any Orf63 homolog beyond the family of temperate bacteriophages. A machine learning approach was used to design an amphipathic helical ligand that bound a hydrophobic cleft on Orf63. This approach may open a new path towards designing therapeutics that antagonize the contributions of Stx phages in EHEC outbreaks.

16
AI-first structural identification of pathogenic protein targets

Saluri, M.; Landreh, M.; Bryant, P.

2024-12-16 bioinformatics 10.1101/2024.12.12.628104 medRxiv
Top 0.1%
14.2%

The likelihood for pandemics is increasing as the world population grows and becomes more interconnected. Obtaining structural knowledge of protein-protein interactions between a pathogen and its host can inform pathogenic mechanisms and treatment or vaccine design. Currently, there are 52 nonredundant human-pathogen interactions with known structure in the PDB, although there are 21064 with experimental support in the HPIDB, meaning that only 0.2% of known interactions have known structure. Recent improvements in structure prediction of protein complexes based on AlphaFold have made it possible to model heterodimeric complexes with very high accuracy. However, it is not known how this translates to host-pathogen interactions which share a different evolutionary relationship. Here, we analyse the structural protein-protein interaction network between ten different pathogens and their human host. We predict the structure of 9452 human-pathogen interactions of which only 10 have known structure. We find that we can model 30 interactions with an expected TM-score of ≥0.9, expanding the structural knowledge in these networks three-fold. We select the highly-scoring Francisella tularensis dihydroprolyl dehydrogenase (IPD) complex with human immunoglobulin Kappa constant (IGKC) for detailed analysis with homology modeling and native mass spectrometry. Our results confirm the predicted 1:2:1 heterotetrameric complex with potential implications for bacterial immune response evasion. We are entering a new era where structure prediction can be used to guide vaccine and drug development towards new pathogenic targets in very short time frames.

17
Conservation of the hydrogen-bond network in bacterial response regulators

Hamid, M.; Jabeen, I.; Chaudhary, S. U.; Khan, S. M.

2025-09-16 biophysics 10.1101/2025.09.13.674700 medRxiv
Top 0.1%
14.0%

The bacterial response regulator (RR) superfamily is activated by single aspartyl phosphorylation to modulate a distant target binding surface for diverse functions. The enteric CheY RRs, which represent the chemotaxis subfamily, have been extensively characterized. Their native, chemical or genetically-altered crystal structures have revealed an essential role for water-mediated hydrogen bonds (H-bonds) in activation. Here, we use molecular dynamics (MD) to compare the protein-water H-bond network in basal and in-silico phosphorylated conformations. We supplement the MD with energy frustration profiles for atomic structures and models from selected RR superfamily representatives. The energetically frustrated phosphorylation pocket consists of the conserved aspartate triad for phosphorylation, plus associated structural waters and residues for Mg2+ ion coordination. It orchestrates the H-bond network characterized here in atomic detail. The network has an energetically stable core. Its plastic nodes switch bonding states coupled to loop flexibility and sidechain rotations. Mutual information reveals that the long-range, dynamic networks respond to single H-bond transitions. The network centrality of the phosphorylation pocket, connected to the target binding surface by water-mediated channels via the conserved switch residues (T87, K109), increases upon phosphorylation. Analysis of other RR representatives suggests this design is a generic feature of RR allostery with subtle, function-dependent differences. The water contribution may prove critical for the design of RR sub-family-specific allosteric inhibitors.

18
"Multiplex" rheostat positions cluster around allosterically critical regions of the lactose repressor protein

Bantis, L. E.; Parente, D. J.; Fenton, A. W.; Swint-Kruse, L.

2020-11-17 biochemistry 10.1101/2020.11.17.386979 medRxiv
Top 0.1%
13.9%

Amino acid variation at "rheostat" positions provides opportunity to modulate various aspects of protein function - such as binding affinity or allosteric coupling - across a wide range. Previously a subclass of "multiplex" rheostat positions was identified at which substitutions simultaneously modulated more than one functional parameter. Using the Miller laboratory's dataset of ~4000 variants of lactose repressor protein (LacI), we compared the structural properties of multiplex rheostat positions with (i) "single" rheostat positions that modulate only one functional parameter, (ii) "toggle" positions that follow textbook substitution rules, and (iii) "neutral" positions that tolerate any substitution without changing function. The combined rheostat classes comprised >40% of LacI positions, more than either toggle or neutral positions. Single rheostat positions were broadly distributed over the structure. Multiplex rheostat positions structurally overlapped with positions involved in allosteric regulation. When their phenotypic outcomes were interpreted within a thermodynamic framework, functional changes at multiplex positions were uncorrelated. This suggests that substitutions lead to complex changes in the underlying molecular biophysics. Bivariable and multivariable analyses of evolutionary signals within multiple sequence alignments could not differentiate single and multiplex rheostat positions. Phylogenetic analyses - such as ConSurf - could distinguish rheostats from toggle and neutral positions. Multivariable analyses could also identify a subset of neutral positions with high probability. Taken together, these results suggest that detailed understanding of the underlying molecular biophysics, likely including protein dynamics, will be required to discriminate single and multiplex rheostat positions from each other and to predict substitution outcomes at these sites.

19
A short commentary on indents and edges of β-sheets

Khare, H.; Ramakumar, S.

2019-11-21 bioinformatics 10.1101/850982 medRxiv
Top 0.1%
12.5%

β-sheets in proteins are formed by extended polypeptide chains, called β-strands. While there is a general consensus on two types of β-strands, viz. edge strands (or edges) and inner strands (or central strands), the possibility of distinguishing between different regions of inner strands remains less explored. In this paper, we address the portions of inner strands of β-sheets that stick out on either or both sides. We call these portions the indent strands or indents because they give the typical indented appearance to β-sheets. Similar to the edge strands, the indent strands also have β-bridge partner residues on one side while the other side is still open for backbone hydrogen bonds. Despite this similarity, the indent strands differ from the edge strands in terms of various properties such as β-bulges and amino acid composition due to their localization within β-sheets and therefore within folded proteins to a certain extent. The localization of indents and edges within folded proteins seems to govern the strategies deployed to deter unhindered β-sheet propagation through β-strand stacking interactions. Our findings suggest that edges and indents differ in their strategies to avoid further β-strand stacking. Short length itself is a good strategy to avoid stacking and a majority of indents are two residues or shorter in length. Edge strands on the other hand are overall longer. While long edges are known to use various negative design strategies like β-bulges, prolines, strategically placed charges, inward-pointing charged side chains and loop coverage to avoid further β-strand stacking, long indents seem to favor mechanisms such as enrichment in flexible residues with high solvation potential and depletion in hydrophobic residues in response to their less solvent exposed nature. Such subtle differences between indents and edges could be leveraged for designing novel β-sheet architectures.

20
A Graph-Directed Approach for Creation of a Homology Modeling Library: Application to Venom Structure Prediction

Mansbach, R. A.; Chakraborty, S.; Travers, T.; Gnanakaran, S.

2019-11-01 bioinformatics 10.1101/828129 medRxiv
Top 0.1%
12.5%

Many toxins are short, cysteine-rich peptides that are of great interest as novel therapeutic leads and of great concern as lethal biological agents due to their high affinity and specificity for various receptors involved in neuromuscular transmission. To perform initial candidate identification for design of a drug impacting a particular receptor or for threat assessment as a harmful toxin, one requires a set of candidate structures of reasonable accuracy with potential for interaction with the target receptor. In this article, we introduce a graph-based algorithm for identifying good extant template structures from a library of evolutionarily-related cysteine-containing sequences for structural determination of target sequences by homology modeling. We employ this approach to study the conotoxins, a set of toxin peptides produced by the family of aquatic cone snails. Currently, of the approximately six thousand known conotoxin sequences, only about three percent have experimentally characterized three-dimensional structures, leading to a serious bottleneck in identifying potential drug candidates. We demonstrate that the conotoxin template library generated by our approach may be employed to perform homology modeling and greatly increase the number of characterized conotoxin structures. We also show how our approach can guide experimental design by identifying and ranking sequences for structural characterization in a similar manner. Overall, we present and validate an approach for venom structure modeling and employ it to expand the library of extant conotoxin structures by almost 300% through homology modeling employing the template library determined in our approach.